[RFC] Initial gRPC protocol for agent communication #2
Conversation
pkg/agent/api/grpc/hyperstart.proto
Outdated
message CreateContainerRequest {
    Container container = 1;
    Process init = 2;
pls replace the 2 fields above with container_id.
Don't we need the configuration of the process, or has it already been defined by the OCI spec?
pkg/agent/api/grpc/hyperstart.proto
Outdated
    string type = 1;
    uint64 hard = 2;
    uint64 soft = 3;
}
remove the above Container/Mount/Process/Rlimit.
rename User to StringUser.
StringUser is needed since the OCI spec doesn't allow string user ids.
pkg/agent/api/grpc/oci.proto
Outdated
string Version = 1;

// Process configures the container process.
OCIProcess Process = 2;
Remove the OCI prefixes for OCIProcess, OCIMount, OCIUser...
based on hyperstart gRPC protocol and the discussion in hyperhq/runv#628 Signed-off-by: Wang Xu <gnawux@gmail.com>
updated based on the above comments
@gnawux You need to regenerate the …
}

message ExecProcessRequest {
    string container_id = 1;
add StringUser here
@laijs This would be redundant with the Process.User field, wouldn't it?
Quoting the comment by @laijs above:
"StringUser is needed since the OCI spec doesn't allow string user ids"
Please correct me if anything is wrong.
Sure, to make containers boot quickly, avoiding mounting the root block device on the host is very important.
@sboeuf That username is used for Windows only: opencontainers/runtime-spec@f9e48e0
It is a little weird if we reuse this variable. @gnawux do you accept it?
We can always mount the block device inside the container to access /etc/passwd.
But we avoid mounting the block device on the host (for accessing /etc/passwd and filling in the numerical user id in the spec):
- The host might not have the ability to mount it. For example, if the block device comes from Ceph and the host doesn't have krbd.ko. And even if the host has it, the userspace Ceph lib + QEMU is usually the better choice.
- Security: if the block device was once mounted inside the VM, the host should not mount it. Code in the VM might hack into the guest kernel and modify the metadata of the filesystem on the block device, and the host kernel might then be broken into when mounting that block device.
- Performance: to speed up the starting of the container, hyper avoids mounting the block device on the host.
@laijs not sure I follow everything here, because I thought that /etc/passwd was passed through 9pfs via the shared directory. And in case /etc/passwd comes from the block device passed through the VM, it is mounted somewhere, meaning that you can tell libcontainer that you expect /etc/passwd to be mounted, right?
@laijs just to be clear, I am not saying that I want to mount the block device on the host; I agree it would take some time. We don't do that either.
But the block device needs to be mounted inside the guest by the agent, and then we can provide libcontainer with the right mounts so that it will mount /etc/passwd in the right place. This way, it would be able to know what 1000:1000 means for the user/group.
You are right.
Runtime --> agent --> libcontainer.
The runtime has to pass the string user name to the agent if a string name is specified by upper-layer tools. The agent parses the string name from /etc/passwd and passes the numeric user id to libcontainer.
"Runtime --> agent" is the API, so we need the string user name in the API.
docker-runc doesn't allow the string user name, so docker/moby does the parsing of the user name. The kata runtime CLI will not allow the string user name either. But the kata runtime can also be a lib for hyperd/frakti, and in that case the string user name will be passed to the agent if requested.
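The agent-side step described above (parse /etc/passwd inside the guest, hand a numeric uid/gid to libcontainer) can be sketched in Go. Note this is only an illustration: `resolveUser` and its passwd-string input are hypothetical names, not the actual kata agent API.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// resolveUser parses passwd(5)-formatted data (as the agent might read from
// the guest's /etc/passwd) and returns the numeric uid/gid for a string
// user name. Hypothetical helper, for illustration only.
func resolveUser(passwd, name string) (uid, gid int, err error) {
	sc := bufio.NewScanner(strings.NewReader(passwd))
	for sc.Scan() {
		// passwd format: name:passwd:uid:gid:gecos:home:shell
		fields := strings.Split(sc.Text(), ":")
		if len(fields) < 4 || fields[0] != name {
			continue
		}
		if uid, err = strconv.Atoi(fields[2]); err != nil {
			return 0, 0, err
		}
		if gid, err = strconv.Atoi(fields[3]); err != nil {
			return 0, 0, err
		}
		return uid, gid, nil
	}
	return 0, 0, fmt.Errorf("user %q not found", name)
}

func main() {
	passwd := "root:x:0:0:root:/root:/bin/sh\nwww:x:1000:1000::/home/www:/bin/sh\n"
	uid, gid, err := resolveUser(passwd, "www")
	if err != nil {
		panic(err)
	}
	fmt.Println(uid, gid) // 1000 1000
}
```

The numeric ids would then be filled into the Process.User field handed to libcontainer, which is why the API itself still needs to carry the string name.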
rpc AddRoute(AddRouteRequest) returns (google.protobuf.Empty);
rpc OnlineCPUMem(OnlineCPUMemRequest) returns (google.protobuf.Empty);
}
message Storage {
    string driver = 1;      // empty in most cases; it will support "drbd", "bcache", ...
    string source = 2;      // "/dev/sdb", "/dev/disk/by-scsi/xxxx", "none", ...
    string fstype = 3;      // "xfs", "ext4" etc. for a block dev, "9p" for the shared fs, or "tmpfs" for a /dev/shm shared by all containers, ...
    repeated string options = 4;
    string mount_point = 5; // mount_point is only visible to the VM, not to containers. This mount point can be used in oci.Mount.Source as "/Storage/mount/point/{rootfs|_data}".
}
I think this Storage + oci.Mount can reassemble the current hyperstart storage model.
@laijs Would you mind describing use cases for this Storage payload?
@sameo In short, for backward compatibility with the current hyperd/hyperstart.
The Storage derives from the volumes of hyperd and runv, in which we can insert a block device (as the source; a loop device or Ceph rbd is fine as well) as a volume of a container (with filesystem fstype, at the mount_point).
According to the assumptions of the WIP CSI, block devices would be inserted as block devices instead of mounted filesystems, i.e., for most current container images a k8s init container is needed to mount the block devices before they can be accessed.
The Storage struct here can be thought of as a shortcut for that volume-init-container behavior, because block devices are much more common in a virtualized world. Then we don't need to run a volume init container before launching a container, especially for cases not working with k8s.
Any additional explanation, @laijs?
@gnawux explained it from a high-level view. Here I focus on the low-level view:
We need the initial storage config for mounting the 9p share, which can be configured by this Storage in StartSandboxRequest.
oci.Spec can't ask you to "mount /dev/sda1 to /kata/storage/xxxx and use /kata/storage/xxxx/rootfs as the container rootfs".
oci.Spec can't ask you to "mount /dev/sdd to /kata/storage/yyy and use /kata/storage/yyy/_data as one of the container volumes".
oci.Mount can't ask you to mount a tmpfs on /kata/storage/shm and bind it to all containers' /dev/shm.
And more ...
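To make the mount_point convention above concrete, here is a hedged Go sketch. The `Storage` struct mirrors the proposed message fields; `containerRootfs` is a hypothetical helper showing how the guest-side mount point maps to a path usable in oci.Mount.Source, not actual agent code.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Storage mirrors the proposed proto message (sketch, not generated code).
type Storage struct {
	Driver     string   // empty in most cases
	Source     string   // e.g. "/dev/sda1"
	Fstype     string   // e.g. "ext4", "9p", "tmpfs"
	Options    []string // mount options
	MountPoint string   // visible to the VM only, not to containers
}

// containerRootfs returns the path an oci.Spec could reference as the
// container rootfs once the Storage has been mounted inside the VM,
// following the "/storage/mount/point/rootfs" convention from the comment.
func containerRootfs(s Storage) string {
	return filepath.Join(s.MountPoint, "rootfs")
}

func main() {
	s := Storage{
		Source:     "/dev/sda1",
		Fstype:     "ext4",
		MountPoint: "/kata/storage/xxxx",
	}
	fmt.Println(containerRootfs(s)) // /kata/storage/xxxx/rootfs
}
```

The point is that the agent performs the Storage mount first, and only then can the oci.Spec's mounts and rootfs refer to paths beneath mount_point.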
@laijs I was about to ask for a shareDir parameter for StartSandbox, since we need a way to pass some files and rootfs through 9pfs. Your Storage structure looks good because it is a generic way to pass things to the VM through any type of filesystem that needs to be shared across all containers, and that is not actually hotpluggable.
Also, to give some more details here, we cannot completely rely on block devices to pass the rootfs to the VM. In the case of overlay, we don't have a block device, and I don't think we want to spend a large amount of time preparing a block device based on it; it would consume too much time. I am also not sure how changes to the overlay layers would be reflected on such a block device.
That's why having a mount point that can be shared when starting the VM would be the best option to support multiple cases.
@laijs we have renamed the driver field to format in runV, and the content should be stuff like raw, qcow2, dir, ... We don't mind whether it is a bcache device or a loop file, as long as it can be treated as a block device.
We use block devices for passing the rootfs to the VM in several cases. Host storage drivers such as devicemapper, (cow-)rawblock, and ceph (hyper.sh) fit well into runv/hyperstart. In these cases, block devices containing the container rootfs or a volume are hotplugged into the VM. In the near future, DAX is going to be supported, which also relies on block devices.
Only when the host storage driver is overlayfs/aufs/btrfs is 9pfs used for passing the rootfs.
So both ways (block & shared fs) are supported. @sboeuf
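The driver-based choice described above could be sketched as a small Go helper. Both `useSharedFS` and the exact set of driver names it matches are assumptions for illustration; the real runv/kata decision logic lives elsewhere.

```go
package main

import "fmt"

// useSharedFS sketches the host-side decision described above:
// overlayfs/aufs/btrfs graph drivers have no per-container block device to
// hotplug, so the rootfs goes through 9pfs; block-backed drivers
// (devicemapper, rawblock, ceph, ...) hotplug the device into the VM.
// Hypothetical helper and driver names, for illustration only.
func useSharedFS(graphDriver string) bool {
	switch graphDriver {
	case "overlayfs", "overlay", "overlay2", "aufs", "btrfs":
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(useSharedFS("overlay2"))     // true  -> pass rootfs via 9pfs
	fmt.Println(useSharedFS("devicemapper")) // false -> hotplug block device
}
```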
}

message CreateContainerRequest {
    string container_id = 1;
+ repeated Storage
+ StringUser
Agreed with repeated Storage.
message StartSandboxRequest {
    string hostname = 1;
    repeated string dns = 2;
+ repeated Storage
@laijs Yes, we need that one.
Why do we need it in both the Create and Start APIs?
We need it in StartSandbox for storage endpoints that are shared across all containers, and in CreateContainer for container-specific ones.
Here StartSandbox is actually a create-and-start operation. But I'm fine if we rename it to CreateSandbox(). It's also consistent with DestroySandbox().
I think "Sandbox" has some runtime semantics and is not persisted: it can be created as running, paused, or destroyed, and there should not be a created-but-stopped sandbox in our context.
@gnawux Right. So what I'm saying is that I'm fine if we only have:
rpc CreateSandbox(CreateSandboxRequest) returns (google.protobuf.Empty);
rpc DestroySandbox(DestroySandboxRequest) returns (google.protobuf.Empty);
because in our case we're only going to create and destroy sandboxes.
Are you suggesting we should have:
rpc CreateSandbox(CreateSandboxRequest) returns (google.protobuf.Empty);
rpc StartSandbox(StartSandboxRequest) returns (google.protobuf.Empty);
rpc StopSandbox(StopSandboxRequest) returns (google.protobuf.Empty);
rpc DestroySandbox(DestroySandboxRequest) returns (google.protobuf.Empty);
?
@sameo I think I did, but I found something different from the code you generated. Which version of protoc are you using? And did you generate the …
#
# SPDX-License-Identifier: Apache-2.0
#
protoc --proto_path=pkg/agent/api/grpc --go_out=plugins=grpc:pkg/agent/api/grpc pkg/agent/api/grpc/hyperstart.proto pkg/agent/api/grpc/oci.proto
I believe you also want to add -I=$GOPATH/src/github.com/google/protobuf/src/ to this command. Without it, protoc seems unable to find the Empty message:
$ hack/update-generated-agent-proto.sh
google/protobuf/empty.proto: File not found.
hyperstart.proto: Import "google/protobuf/empty.proto" was not found or had errors.
hyperstart.proto:15:62: "google.protobuf.Empty" is not defined.
hyperstart.proto:16:60: "google.protobuf.Empty" is not defined.
hyperstart.proto:17:54: "google.protobuf.Empty" is not defined.
hyperstart.proto:18:58: "google.protobuf.Empty" is not defined.
hyperstart.proto:25:52: "google.protobuf.Empty" is not defined.
hyperstart.proto:26:56: "google.protobuf.Empty" is not defined.
hyperstart.proto:29:56: "google.protobuf.Empty" is not defined.
hyperstart.proto:30:60: "google.protobuf.Empty" is not defined.
hyperstart.proto:31:62: "google.protobuf.Empty" is not defined.
hyperstart.proto:32:48: "google.protobuf.Empty" is not defined.
hyperstart.proto:33:56: "google.protobuf.Empty" is not defined.
Am I missing something?
➜ runtimes git:(grpc_proto) protoc --version
libprotoc 3.5.0
➜ runtimes git:(grpc_proto) hack/update-generated-agent-proto.sh
➜ runtimes git:(grpc_proto)
Works on my Mac.
@sameo Ah, I know why you got the Empty errors.
My protoc is built from source and installed together with the include files.
If you download the prebuilt protoc and only put the bin/protoc binary in your $PATH without installing the include/ dir, you will get the import error.
However, I have not generated the code with the getters in the diff yet.
@gnawux I installed protoc from my distro's protobuf packages (FC27, protobuf and protobuf-compiler):
$ rpm -q protobuf-compiler
protobuf-compiler-3.3.1-2.fc27.x86_64
@sameo Then does it have a -devel or -dev like package?
And we need to figure out why they generate different code.
Finally I solved the getter issue: I found an older protoc-gen-go in a higher-priority position in my $PATH.
@gnawux I'm using:
And when generating the pb.go files with the hack/ script, I get the following diff: https://gist.github.com/sameo/acb737c5370957afb96e5e7d905d1b24
@sameo I found the diff before I posted the PR as well, and I checked many things but didn't figure out what's missing...
I think it would be better to create a new repo for the API, such as agent-api, or move this API into the agent repo.
@gnawux Let's use the agent repo as it's the one that makes the most sense for now.
Then I will close this PR after updating it in the agent repo.
based on hyperstart gRPC protocol and the discussion in hyperhq/runv#628 and kata-containers/runtime#2
based on hyperstart gRPC protocol and the discussion in hyperhq/runv#628 and kata-containers/runtime#2 Signed-off-by: Wang Xu <gnawux@gmail.com>
Closed this PR and moved the protocol code to the kata-containers/agent repo.
Based on the hyperstart gRPC protocol and the discussion in hyperhq/runv#628.
A few people think it is better to put the protocol in an independent repo; what's your opinion?